Abstract



We present a formulation of naturalness within the framework of Bayesian statistics,
which resolves the conceptual problems of previous approaches. Among other things,
the relative interpretation of the measure of naturalness turns out to be unambiguously
established by Jeffreys' scale. Also, the usual sensitivity formulation (the so-called
Barbieri-Giudice measure) appears to be embedded in our formulation in an extended
form. We derive the general sensitivity formula applicable to an arbitrary number of
observables. Several consequences and developments are further discussed. As a final
illustration, we work out the map of combined fine-tuning associated with the gauge
hierarchy problem and neutralino dark matter in a classic supersymmetric model.



1 Introduction 



The notions of naturalness and fine-tuning are a center of interest in some domains of 
theoretical physics, like the theoretical side of particle physics and cosmology. Loosely 
speaking, these notions refer to the propensity of a model to reproduce the experimental 
observations. When they are employed, their effect is to modify our degree of belief 
in the model examined. Indeed, intuitively, our degree of belief in a model follows its 
propensity to fulfil experimental constraints. For instance, when the parameters of
a model should be adjusted very precisely to satisfy a constraint, the model is said to 
suffer from a lack of naturalness, or to have a fine-tuning problem, and this consideration 
typically decreases our degree of belief in the model. 

But these considerations, even if they are taken to be intuitive by a certain fraction 
of people, remain fully subjective and unquantified. To be more precise and eventually 
extract some objective information from these intuitive observations, two things are 
necessary. First, it is necessary to define a consistent measure of naturalness. Second, 
it is also necessary to have a rule telling how the measure of naturalness should be 
mapped to our degree of belief. This second point is important, because the measure of 
naturalness would not be usable if the subjectivity was not under control. 

Several naturalness issues appear in particle physics, in particular the gauge
hierarchy problem, the strong CP puzzle and the flavour puzzle, as well as in cosmology, with
the cosmological constant problem, the cosmological coincidence and the flatness problem,
the latter being resolved by inflationary theory. We refer to App. A for a short
reminder on these different issues.

To our knowledge, it is in the context of the gauge hierarchy problem that a measure
of naturalness, i.e. of fine-tuning, was first built. Indeed, supersymmetric (SUSY)
models solve the gauge hierarchy problem up to a certain degree, leaving a so-called
little hierarchy problem.^1 In the seminal papers [1] and [2], the amount of fine-tuning is
defined as the sensitivity of the electroweak scale (characterized by the Z boson mass)
with respect to the model parameters. An ad hoc formula quantifying the fine-tuning is
then derived,

$$ \max_i \left| \frac{\partial \log m_Z^2}{\partial \log \theta_i} \right| . \qquad (1) $$

In this context, the formula gives a measure of the amount of cancellation between the 
SUSY parameters, which are typically O(TeV), necessary to reproduce the Z boson mass, 
one order of magnitude below. 

This sensitivity measure (often called the Barbieri-Giudice measure) is widely used
in the SUSY literature. We refer for example to [3-6] for recent work making use of it.
However, this formulation has also been criticized, either for its limitations or at the
conceptual level. At the conceptual level, perhaps the most straightforward remark is that
there is no rule connecting the sensitivity measure to our degree of belief. The interpretation
of the numbers provided by Eq. (1) is therefore fully subjective.

Several attempts have already been made in the literature to produce alternative
definitions, in particular the papers [7] and [8]. Among other things, the work
[7] introduces the key notion of probability distribution of parameters, while [8] also
introduces the important notion of volume of parameter space. Other propositions have
also been discussed in [9-12]. However, even though all of these alternative propositions
are well motivated and contain interesting ingredients, it is unfortunately still possible to
find conceptual problems and criticisms. Overall, one may find that all those measures
of fine-tuning are a bit ad hoc or lack a robust framework.

^1 The little hierarchy problem comes from the tension remaining between electroweak and TeV scales,
and is in fact an issue common to a lot of (all?) models of physics beyond the Standard Model.

It is with the aim of offering a solid framework for the notion of naturalness that we
present an approach based on Bayesian statistics. Among other things, the link between
the naturalness measure and the degree of belief will be established from Jeffreys' scale. 
Also, our approach turns out to contain the sensitivity measure in a generalized form. 
Once embedded in this framework, the usual limitations and problems of the sensitiv- 
ity measure vanish. An attempt in that direction has been done in It is, to our 
knowledge, the only paper containing this idea. 

The article is organized as follows. Naturalness problems, the sensitivity formulation
and its conceptual flaws are reviewed in a generic way in Section 2. Section 3 is
devoted to the basics of Bayesian model comparison relevant for our purpose, so that our
presentation is self-contained from the point of view of Bayesian statistics. We then
present the Bayesian approach to naturalness and its implications in Section 4. Section 5 is
finally devoted to some applications of the results, focusing mainly on the gauge hierarchy
problem in supersymmetric models.

2 Fine-tuning, puzzles and sensitivity 

In this section, we discuss in a generic way naturalness problems and the sensitivity
formulation. The presentation is deliberately broad, and applicable to any naturalness
problem. Along the way, some of the statements might appear weak or lacking
solid definitions. These inconsistencies will be highlighted in the last paragraph. The
critical point of view will be adopted only in this last part; the rest of the section
supports the sensitivity formulation.

Throughout the section, we will consider a dimensionless quantity $\delta$ defined in a given
model $\mathcal{M}$, with parameters $\theta_i$. We will assume that $\delta$ is subject to experimental
constraints (or any other piece of information exterior to the model). In all generality,
one can say that a naturalness problem appears when $\delta$ is constrained to values that
it is not expected to take. In particular, it can be different from $O(1)$ while it was
expected to be $O(1)$, or conversely it can be of $O(1)$ while it was not expected to be
$O(1)$. For instance, the gauge hierarchy, cosmological constant and strong CP problems
fall into the first category, with $\delta$ being $m_Z^2/M_{Pl}^2$, $\rho_\Lambda/M_{Pl}^4$ and $\theta/2\pi$, respectively.
The flatness problem and the cosmological coincidence fall into the second category,
with $\delta$ being $\rho/\rho_c$ and $\rho_\Lambda/\rho_m$, respectively. However, this splitting into two categories
is in fact artificial. Indeed, one always has the freedom to transform one into the other by
redefining $\delta \to 1/(\delta - 1)$. Therefore, in this section, whatever the naturalness problem
is, we will always choose to define $\delta$ as a number unnaturally smaller than one.

In all generality, $\delta$ is a function of the model parameters, $\delta(\theta_i)$. As we are concerned
with the values that $\delta$ can potentially take, the dependence on the parameters
is crucial. In the limiting case where $\delta$ does not depend at all on the parameters,
it is completely determined by the model $\mathcal{M}$. In that case, $\mathcal{M}$ is totally predictive, or
in other words, totally natural. In the opposite direction, the more $\delta$ depends on the
$\theta_i$, the more precisely one has to adjust these parameters to satisfy the experimental
constraint, or, in other words, the more $\mathcal{M}$ is fine-tuned.

It is then tempting to define a measure of naturalness by making use of the derivative
of $\delta$ with respect to the parameters. An appropriate quantity has to involve the
logarithm of $\delta$ to measure a relative variation, and the logarithm of $\theta_i$ to remain
independent of the choice of units when $\theta_i$ is dimensionful. Therefore
this measure has to be based on the quantity $|\partial\log\delta/\partial\log\theta_i|$. Two such measures have
been provided (see [1, 2, 9, 13]), in the context of the gauge hierarchy problem and its
supersymmetric solution:

$$ c_a = \max_i \left| \frac{\partial\log\delta}{\partial\log\theta_i} \right| \quad \text{or} \quad c_b = \sqrt{\sum_i \left| \frac{\partial\log\delta}{\partial\log\theta_i} \right|^2} . \qquad (2) $$

Whatever the exact definition is, we will denote this kind of quantity as $c$. With this
definition, a fully predictive model has $c = 0$, and a model requiring infinite fine-tuning
has $c \to \infty$.

However, in between these two extreme cases, one can also identify a particular
threshold, $c = 1$. This is the particular case where $\delta$ is directly an input parameter
of $\mathcal{M}$. The strong CP puzzle and the flavour puzzle in the Standard Model, as well as
the flatness of the universe and the cosmic coincidence in the Standard Cosmological Model, are
all examples of such a case. Depending on the context and on opinions, this situation
is sometimes considered as being a "puzzle", and not a "problem". However, this kind
of consideration is subjective. It depends ultimately on whether the scientist wishes to
find a model more natural than the one with $c = 1$, or is satisfied with that one.
In any case, from the strict point of view of sensitivity, $c = 1$ does appear as the boundary
between predictivity and fine-tuning.

Let us propose two toy examples to illustrate the $c$ measure. To explain the smallness
of $\delta$, one often has to invoke "special cancellations" between the $\theta_i$. It is for example
the case in the gauge hierarchy problem, where cancellations between $O(M_{Pl})$ quantum
contributions need to occur to obtain $m_Z$, or in the cosmological constant problem, in
which cancellations between $O(M_{Pl})$ quantum contributions have to occur to make $\Lambda$
vanish. Let us sketch this by $\delta \propto 1 - \theta$, where $\theta$ is a parameter expected to be $O(1)$.
To produce $\delta \ll 1$, $\theta$ has to be tuned to be close to one. The $c$ measure is then
$|\partial\log\delta/\partial\log\theta| = \theta/\delta$. It is proportional to $\theta$, which is $O(1)$, and to $1/\delta$, which grows
with the precision of the cancellation required. This quantity thus properly measures the
amount of cancellation necessary to get $\delta \ll 1$.

The second toy example is the situation where $\delta \propto e^{-\theta}$, with $\theta$ still an $O(1)$
parameter. This case of "exponential suppression" appears for example in the Randall-Sundrum
setup to solve the gauge hierarchy problem, in inflationary theories to explain
why $\rho/\rho_c - 1$ is so small, and also in the dimensional transmutation arising when an
asymptotically free theory becomes confining in the infrared. In that case, the $c$ measure is
$|\partial\log\delta/\partial\log\theta| = \theta$. It does not depend on $\delta$ but only on the order one parameter.
Comparing $c = \theta/\delta$ and $c = \theta$, one can see that the "exponential" model is more natural
by a factor $\delta$ with respect to the "cancellation" model.
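The two toy models can be checked numerically. The following is a minimal sketch, assuming the forms $\delta = 1 - \theta$ and $\delta = e^{-\theta}$ with illustrative parameter values; the helper `sensitivity` and the chosen numbers are hypothetical and only serve to evaluate $|\partial\log\delta/\partial\log\theta|$ by a central finite difference.

```python
import math

def sensitivity(delta, theta, eps=1e-6):
    """Numerical |d log(delta) / d log(theta)| from a central difference in log(theta)."""
    up = math.log(delta(theta * math.exp(eps)))
    down = math.log(delta(theta * math.exp(-eps)))
    return abs((up - down) / (2.0 * eps))

def delta_cancellation(theta):
    # "Cancellation" toy model: delta = 1 - theta
    return 1.0 - theta

def delta_exponential(theta):
    # "Exponential suppression" toy model: delta = exp(-theta)
    return math.exp(-theta)

# Illustrative values (assumptions): theta tuned to 0.99 gives delta = 0.01
c_cancel = sensitivity(delta_cancellation, 0.99)  # analytic value theta/delta
c_exp = sensitivity(delta_exponential, 2.0)       # analytic value theta

print(c_cancel, c_exp)
```

With these inputs the cancellation model gives a sensitivity close to $\theta/\delta$, while the exponential model gives a sensitivity close to $\theta$, whatever the size of $\delta$.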

This way of formulating a measure of naturalness using sensitivity seems well justified,
even if rather ad hoc. However, taking a closer look, one can identify several
conceptual flaws, more or less linked together, some of them being already apparent in
what we wrote above.

Firstly, the notion of "expectation" for the value of a quantity, which is used throughout
the section, is not rigorously defined. Even if one tries to express things differently, at
some point this notion appears and requires a precise definition. Secondly, the notion
of parameter space does not appear in this formulation. This is a bit worrying, as we are
concerned with all potential values that $\delta$ could take. These two remarks are particularly
suggestive of the Bayesian approach which will be presented in the following sections.
But the third, worse issue is the following: there is no rule telling us how to interpret the
sensitivity measure in terms of a degree of belief. This holds for the absolute interpretation
of $c$, and also at the level of the relative interpretation, when one compares two
different values of $c$. For example, we said above that $c$ in the exponential toy model is
rescaled by a factor $\delta$ with respect to the cancellation model. But does it really mean
that our relative degree of belief between the two models should be given by the value
$\delta$? Or maybe by some power of $\delta$, or by another function of it? Finally, we can notice
the freedom of redefinition of $\delta$. For instance, if one redefines $\delta \to \delta^{100}$, $c$ is scaled
by a factor 100. Given the absence of a rule to interpret $c$, this fact does not constitute
a problem in itself. Instead, it can be taken as a consistency constraint. That is, the
interpretation of $c$ should vary consistently with a redefinition of $\delta$, so that the
conclusions remain unchanged.
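The effect of a redefinition of the constrained quantity on the sensitivity can be made concrete. The sketch below is an illustration only, assuming the cancellation toy model with $\theta = 0.9$ (a hypothetical value) and a redefinition by a power 100; it works directly with the logarithm of the quantity, so that raising it to a power is a simple multiplication.

```python
import math

def log_sensitivity(log_delta, theta, eps=1e-6):
    """|d log(delta) / d log(theta)| given a function returning log(delta)."""
    up = log_delta(theta * math.exp(eps))
    down = log_delta(theta * math.exp(-eps))
    return abs((up - down) / (2.0 * eps))

def log_delta(theta):
    # Cancellation toy model: delta = 1 - theta
    return math.log(1.0 - theta)

def log_delta_power(theta, k=100):
    # Redefined quantity delta**k, so log(delta**k) = k * log(delta)
    return k * log_delta(theta)

theta = 0.9                                     # illustrative value (assumption)
c = log_sensitivity(log_delta, theta)           # analytic value theta/delta = 9
c_k = log_sensitivity(log_delta_power, theta)   # sensitivity after redefinition
ratio = c_k / c

print(c, c_k, ratio)
```

The ratio equals the power used in the redefinition, which shows that the bare number $c$ carries no meaning until a rule of interpretation that transforms consistently is supplied.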



3 Bayesian model comparison 

The aspects of Bayesian statistics relevant for our purpose are briefly reviewed in this
section. For any additional details, we refer the reader to the comprehensive review [14]
and references therein, and to the textbook [15].

Within the framework of Bayesian statistics, the notion of probability is defined as a 
measure of the degree of belief about a proposition. On the other hand, one also knows 
that whatever the definition of probability p is, the axioms of probability theory entail 
Bayes' law: 

$$ p(A|B) = p(B|A)\,\frac{p(A)}{p(B)} , \qquad (3) $$
which, with any additional true information $I$, takes the form

$$ p(A|B,I) = p(B|A,I)\,\frac{p(A|I)}{p(B|I)} . \qquad (4) $$

This well known result gets a crucial meaning when applied to probability as a degree 
of belief. Indeed, replacing A by any hypothesis H, and B by the known information 
available (called d for "data"), the previous equality becomes 

$$ p(H|d,I) = p(d|H,I)\,\frac{p(H|I)}{p(d|I)} . \qquad (5) $$

In Eq. (5), $p(H|I)$ is the probability (i.e. the degree of belief) given to the hypothesis
without taking the data into account, which is called prior probability, or just "prior".
$p(H|d,I)$ is the probability of the hypothesis once the data is taken into account, called
posterior probability. One thus sees that Bayes' formula, applied to a piece of information
$d$ and a hypothesis $H$, tells how our degree of belief in $H$ should be updated in
the light of $d$. It is the remaining term $p(d|H,I)/p(d|I)$ which performs this update.
$p(d|H,I)$ is the probability of obtaining the data assuming that the hypothesis is true.
But taken as a function of $H$, this quantity is not a probability anymore, and is called
a likelihood function, denoted as $\mathcal{L}(H)$. It has to be normalized by the constant $p(d|I)$,
which is called Bayesian evidence. The Bayesian evidence is the sum over all possible
realizations of $H$:

$$ p(d|I) = \sum_H p(d|H,I)\,p(H|I) . \qquad (6) $$
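The update described by Eqs. (5) and (6) can be sketched for a discrete set of hypotheses. The function name `posterior` and the numerical priors and likelihoods below are hypothetical and chosen for illustration only.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a discrete set of hypotheses.

    priors[h] plays the role of p(H|I), likelihoods[h] of p(d|H,I).
    Returns the posterior p(H|d,I) and the evidence p(d|I).
    """
    # Evidence: sum of likelihood times prior over all hypotheses, as in Eq. (6)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    # Posterior: prior updated by the ratio likelihood / evidence, as in Eq. (5)
    post = {h: likelihoods[h] * priors[h] / evidence for h in priors}
    return post, evidence

# Illustrative numbers (assumptions, not taken from any data set)
priors = {"H0": 0.5, "H1": 0.5}
likelihoods = {"H0": 0.8, "H1": 0.2}

post, evidence = posterior(priors, likelihoods)
print(post, evidence)
```

The evidence acts only as a normalization here; its role becomes central once two models are compared.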
|log B_01|   Odds        Probability   Strength of evidence
----------   ---------   -----------   --------------------
< 1.0        < 3 : 1     < 0.750       Inconclusive
1.0          ~ 3 : 1     0.750         Weak evidence
2.5          ~ 12 : 1    0.923         Moderate evidence
5.0          ~ 150 : 1   0.993         Strong evidence

Table 1: The empirical Jeffreys scale calibrating the odds between model $\mathcal{M}_0$ and model $\mathcal{M}_1$.



Two main applications follow from Eq. (5): parameter inference and model comparison.
We will be interested in the latter for our purpose. For model comparison, it is
the Bayesian evidence, Eq. (6), which plays the main role.

Let us consider Eq. (5), where the hypothesis $H$ is "model $\mathcal{M}$ is true" and there is no
additional proposition $I$. The equation becomes

$$ p(\mathcal{M}|d) = p(d|\mathcal{M})\,\frac{p(\mathcal{M})}{p(d)} . \qquad (7) $$

Applying it to two models (which can be the same model with two different priors), $\mathcal{M}_0$
and $\mathcal{M}_1$, and eliminating the unknown constant $p(d)$, one obtains the equation

$$ \frac{p(\mathcal{M}_0|d)}{p(\mathcal{M}_1|d)} = \frac{p(d|\mathcal{M}_0)}{p(d|\mathcal{M}_1)}\,\frac{p(\mathcal{M}_0)}{p(\mathcal{M}_1)} . \qquad (8) $$

The quantity $p(\mathcal{M}_0)/p(\mathcal{M}_1)$ is called prior odds, while $p(\mathcal{M}_0|d)/p(\mathcal{M}_1|d)$ is called
posterior odds. The crucial quantity is the ratio of the Bayesian evidences
$p(d|\mathcal{M}_0)/p(d|\mathcal{M}_1)$, denoted as $B_{01}$, and called Bayes factor.

The Bayes factor tells us how the relative degree of belief between two models is
updated given information $d$. A Bayes factor larger [smaller] than 1 favors $\mathcal{M}_0$
[$\mathcal{M}_1$]. Bayes factors are usually interpreted with respect to Jeffreys' scale [16], given in
Table 1. This scale is empirically calibrated, with thresholds at values of the odds of
3 : 1, 12 : 1 and 150 : 1, representing weak, moderate and strong evidence in favour of
$\mathcal{M}_0$, respectively. It can also be convenient to consider the logarithm $\log B_{01}$.
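As an illustration of how the scale of Table 1 can be used in practice, the sketch below maps the logarithm of a Bayes factor to a category and to a posterior probability. It assumes that the logarithm is the natural one, which is consistent with the quoted odds ($e^{1} \approx 3$, $e^{2.5} \approx 12$, $e^{5} \approx 150$); the function names and the default of equal prior odds are hypothetical choices.

```python
import math

def jeffreys_strength(log_b01):
    """Map |log B_01| (natural log assumed) onto the categories of Table 1."""
    x = abs(log_b01)
    if x < 1.0:
        return "inconclusive"
    if x < 2.5:
        return "weak"
    if x < 5.0:
        return "moderate"
    return "strong"

def posterior_probability(log_b01, prior_odds=1.0):
    """Posterior probability of model 0 from the Bayes factor and the prior odds."""
    odds = prior_odds * math.exp(log_b01)   # posterior odds = Bayes factor x prior odds
    return odds / (1.0 + odds)

for value in (0.5, 1.0, 2.5, 5.0):
    print(value, jeffreys_strength(value), round(posterior_probability(value), 3))
```

The sign of the logarithm tells which model is favored, while its absolute value sets the strength of the evidence.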

Note that for a model with continuous parameters, the Bayesian evidence, Eq. (6),
takes an integral form

$$ p(d|\mathcal{M}) = \int_{\mathcal{D}} p(d|\theta,\mathcal{M})\,p(\theta|\mathcal{M})\,d\theta . \qquad (9) $$

It is then the average of the likelihood function over the parameter space $\mathcal{D}$, weighted
by the prior density of the parameters within the model, $p(\theta|\mathcal{M})$.

Bayesian model comparison tells us how the odds between two models should be
modified by taking into account an external piece of information $d$. It formalizes two
competing effects: quality of fit and predictivity. The first one is the usual measure of
deviation between data and prediction, given by the likelihood function. The second one
is a principle of economy, i.e. a formalization of Occam's razor. It enters in the form
of a notion of volume in the parameter space. Roughly speaking, provided that the volume
of parameter space allowed by the likelihood is smaller than the one allowed by the priors
(i.e. that the data are informative), the Bayes factor will favor the model with the smaller
prior volume. In other words, it favors the model which is the more predictive with
respect to the data. This notion of volume is closely related to Fisher information, which,
in this context, is a measure of the intrinsic amount of information that the likelihood
function and priors contain [17]. For our purposes, we will consider the "observed"
Fisher information, defined as $I\{f\}(x) = -\partial^2\log f/\partial x^2$. For example, the Fisher information of
a normal density with variance $\sigma^2$ is $\sigma^{-2}$, and the Fisher information of a uniform density
over the volume $V$ is $1/V^2$. In the present work, it will mainly be this second aspect,
predictivity, that will matter.
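The Occam effect described above can be illustrated numerically. The sketch below assumes a one-parameter model with a Gaussian likelihood of unit peak value and a uniform prior; the widths, the function name `evidence_uniform_prior` and the midpoint integration are illustrative choices, not part of the formalism.

```python
import math

def evidence_uniform_prior(sigma, width, theta0=0.0, steps=100_000):
    """Average of a Gaussian likelihood (peak value 1) over a uniform prior.

    The prior is uniform on [-width/2, width/2]; the integral is evaluated
    with a simple midpoint rule.
    """
    lo = -width / 2.0
    h = width / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * h
        total += math.exp(-0.5 * ((theta - theta0) / sigma) ** 2)
    return total * h / width   # the prior density is 1/width

sigma = 0.1                                            # likelihood width (assumption)
ev_narrow = evidence_uniform_prior(sigma, width=10.0)  # smaller prior volume
ev_wide = evidence_uniform_prior(sigma, width=100.0)   # larger prior volume
bayes_factor = ev_narrow / ev_wide

print(ev_narrow, ev_wide, bayes_factor)
```

Both models fit the data equally well at their best point, yet the Bayes factor equals the ratio of prior volumes: the model that wastes less parameter space is favored.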

To end this section, let us discuss the prior density $p(\theta|\mathcal{M})$. The choices
of both the functional form and the range of the prior density are critical. The range,
conservatively, should be taken as wide as possible. It can be crucial to have ranges which
are intrinsically bounded, such that prior volumes remain finite. On the other hand, the
functional form of the density is often chosen to be as uninformative as possible, i.e.
as objective as possible. Several approaches, based on Fisher information (Jeffreys prior) or
the Kullback-Leibler divergence (reference priors, see e.g. [18]) have been elaborated to
construct such priors.

In this work, we will make use of the principle of indifference, which is an approach
to minimize the amount of subjective information about a problem. This principle states
that our a priori degree of belief about a problem should be invariant under transformations
considered as irrelevant for the problem [19, 20]. Applied to continuous variables,
this condition constrains the objective densities and can sometimes fully determine them.
For example, a change in coordinates $x' = x + a$ should not influence our a priori degree
of belief on $x$. This transformation is thus considered as irrelevant. This imposes the
condition $p(x + a) = p(x)$, which constrains $p$ to be the uniform density. Another important
example is the one of a dimensionful quantity $\mu$. The principle of indifference states
that our a priori degree of belief $p(\mu)$ should not depend on the choice of units, such
that $\mu' = a\mu$ has the same prior as $\mu$. This translates as the condition $p(\mu) = a\,p(a\mu)$,
which sets $p(\mu) \propto \mu^{-1}$, called logarithmic prior since $\mu^{-1}d\mu = d\log\mu$. As a lot of our
observables and parameters are dimensionful, this logarithmic prior will be omnipresent.
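The invariance of the logarithmic prior under a change of units can be verified with a short numerical sketch. The interval, the conversion factor and the helper `mass` below are hypothetical; the densities are left unnormalized, since only the comparison before and after the rescaling matters.

```python
import math

def mass(density, lo, hi, steps=100_000):
    """Unnormalized prior mass of `density` on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * h) for i in range(steps)) * h

def log_prior(mu):
    return 1.0 / mu      # logarithmic prior, p(mu) proportional to 1/mu

def flat_prior(mu):
    return 1.0           # flat prior, for comparison

a = 1000.0               # change of units, e.g. a factor 1000 (assumption)
lo, hi = 1.0, 10.0

log_before = mass(log_prior, lo, hi)
log_after = mass(log_prior, a * lo, a * hi)
flat_before = mass(flat_prior, lo, hi)
flat_after = mass(flat_prior, a * lo, a * hi)

print(log_before, log_after, flat_before, flat_after)
```

The mass assigned by the logarithmic prior to the interval is the same in both systems of units, whereas the flat prior assigns a mass that scales with the conversion factor.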



4 Naturalness in a Bayesian framework 

We present in this section the Bayesian approach to the notion of naturalness. First, let
us set up the notations. We consider a model $\mathcal{M}$, with a set of dimensionful parameters
$\theta = (\theta_1, \ldots, \theta_n)$, spanning the parameter space $\mathcal{D}$ of dimension $n$. We consider a set of
$m$ dimensionful observables $O(\theta) = (O_1, \ldots, O_m)$ (with $m \le n$) predicted by this model,
taking the measured value $O_{ex}$ on the subset $\mathcal{D}_{ex}$ of the parameter space, of dimension $n - m$.
Data other than the $O$ measurement are collectively called $d$, and the likelihood function
$p(O = O_{ex}|\theta, \mathcal{M})$ is denoted as $\mathcal{L}_O(\theta)$. An amount of precision $\Sigma$ is associated with the
measurement of $O$. It can be for instance the covariance matrix of a multivariate normal
law, or it is more generally given by the Fisher information of the likelihood $\mathcal{L}_O$ (as a
function of $O$), $I\{\mathcal{L}_O\}(O) = \Sigma^{-1}$.

Calling $O$ an "observable" is somewhat misleading. In fact, it just has to be a
quantity constrained by experimental data (or any other exterior piece of information).
Note that compared to the $\delta$ defined in Section 2, we let the $O$ be dimensionful. We
emphasize that we restrict ourselves to dimensionful observables and parameters for the
sake of simplicity. The consequence of this choice is that logarithmic priors appear
everywhere. However, the whole approach extends to any prior. The generalization
does not present any difficulty, and will be explained in the last subsection. Finally, the
reason for the restriction $m \le n$ is that $m > n$ is similar to $m = n$ from the naturalness
point of view. This point will be discussed afterwards, in the last subsection.

4.1 Probability formulation

Loosely speaking, naturalness is the propensity of a given model to reproduce the
experimental observation. Using the notations we adopted, the usual translation of this
idea is

• "Sensitivity of $O$ with respect to $\theta$, in the vicinity of a point $\theta_{ex}$ belonging to $\mathcal{D}_{ex}$."

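This leads to the $c$ measure already presented above,

$$ c_a = \max_i \left| \frac{\partial\log O}{\partial\log\theta_i} \right| \quad \text{or} \quad c_b = \sqrt{\sum_i \left| \frac{\partial\log O}{\partial\log\theta_i} \right|^2} . \qquad (10) $$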



However, an alternative formulation for naturalness, arguably as intuitive as the first 
one, is 

• "Probability of having $O = O_{ex}$ in the model."

It is this second formulation which will be our starting point. As it involves a notion
of probability, it necessarily has a Bayesian character. This formulation is translated as
the probability

$$ p(O = O_{ex}|\mathcal{M}, d) . \qquad (12) $$

But we are interested in this quantity as a function of the hypothesis $(\mathcal{M}, d)$. Taken
as such, it is not a probability, but instead a Bayesian evidence, as defined in Eq. (6).
Due to the absence of normalization, this evidence alone is not usable. Instead, it has to
appear inside a Bayes factor. As a measure of naturalness, we therefore have to consider
a Bayes factor which compares our hypothesis $(\mathcal{M}, d)$ to another hypothesis $(\mathcal{M}', d')$,

$$ B = \frac{p(O = O_{ex}|\mathcal{M}, d)}{p(O = O_{ex}|\mathcal{M}', d')} . \qquad (13) $$

This well-defined quantity plays the main role in our approach.

Two comments are in order. First, it is clear that such a measure of naturalness has
a relative character. In this framework, comparing the naturalness of two models $\mathcal{M}$
and $\mathcal{M}'$ is certainly possible, but an absolute statement about the naturalness of $\mathcal{M}$ has
to be made with care. To do so, $\mathcal{M}'$ would have to be defined such that it constitutes
an absolute reference. How this can be realized will be discussed further in the section.
Second, we emphasize that the distinction between the model $\mathcal{M}$ and the data $d$ is
artificial. Indeed, $d$ could as well be considered as a part of $\mathcal{M}$. It just depends on how
$\mathcal{M}$ is defined. It is convenient to keep this separation explicit for the discussion, and to
stress that $d$ and $d'$ need not be identical.

Let us now specify the different options available for $(\mathcal{M}', d')$. If one takes
the two pieces of data to be identical, $d = d'$, and applies $B$ to two realistic models
$\mathcal{M}$ and $\mathcal{M}'$, it provides a measure of the relative naturalness between these models.
In particular, it makes sense to apply $B$ to the same model with two different prior
densities. For instance, one can compare the naturalness of two different regions of the
parameter space $\mathcal{D}_{ex}$. One can even choose pointlike priors, that is priors that select a
single point belonging to $\mathcal{D}_{ex}$. In that case, $B$ measures the relative naturalness between
two points of $\mathcal{D}_{ex}$. This brings us back to a local measure of naturalness, just like the $c$
measure. Note that the selection of a single point of the parameter space also happens
if the pieces of data $d$ ($d'$) are sufficiently constraining. A necessary condition for that is to
have at least as many observables in $d$ ($d'$) as parameters in $\mathcal{M}$ ($\mathcal{M}'$).

Now, let the two models be identical, $\mathcal{M} = \mathcal{M}'$, and let the pieces of data be
different, $d \neq d'$. $B$ indicates this time the change in naturalness induced by
going from data $d$ to data $d'$. Following the literature, this kind of quantity may be
dubbed the "naturalness price" or "fine-tuning price" associated with the change of data.
A recent work along this line is [21].

Finally, how can $\mathcal{M}'$ be defined such that it constitutes an absolute naturalness
reference? In Section 2, we already identified the two limiting cases of total predictivity and
infinite fine-tuning. We also identified a threshold in between, when the observables $O_{1 \ldots m}$
are input parameters. To define an absolute reference for naturalness, this threshold
as well as the limit of total predictivity may be employed. This suggests two ways of
defining a reference model. Either one can consider that $\mathcal{M}'$ is an ideal, fully natural
model satisfying $O = O_{ex}$ everywhere in its parameter space. We will denote this ideal
model as $\mathcal{I}$. Or, $\mathcal{M}'$ may be a hypothetical "puzzle" model, in which the $O_{1 \ldots m}$ would
be directly input parameters. This model will be denoted $\mathcal{P}$. We call it a "puzzle"
model, because from the point of view of sensitivity, it stands at the threshold between
predictivity and fine-tuning.

These different possibilities for $\mathcal{M}'$ and their implications will be discussed later in
the section. What we have obtained so far is a well-defined measure of naturalness, in
the form of a Bayes factor. Unlike in other approaches, a mapping (Jeffreys' scale)
between this measure and our degree of belief exists. The measure is therefore usable,
and different applications are possible depending on what one defines as being $(\mathcal{M}', d')$.
From now on, we continue the development to show that this probability formulation,
instead of being an alternative to the sensitivity formulation, actually embeds
it.



4.2 Apparition of a sensitivity measure 

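From this point until the end of the section, it is assumed that the measurement of
$O$ is sufficiently precise, such that one can consider the Laplace approximation of the
likelihood function. That is, the log-likelihood can be expanded around a maximum,

$$ \log\mathcal{L}_O(\theta) \approx \log\mathcal{L}_{max} + \frac{1}{2!}\,\frac{\partial^2\log\mathcal{L}_O}{\partial\theta_i\,\partial\theta_j}\,(\theta - \theta_{max})_i\,(\theta - \theta_{max})_j , \qquad (14) $$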



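in which the first order derivatives of $\mathcal{L}_O$ vanish since $\theta_{max}$ is an extremum. This expansion
corresponds to approximating the likelihood as a (multivariate) normal law. Let
us rewrite the right-hand side by introducing the Jacobian matrix of the observables,
$J_{O\,ik} = \partial O^i/\partial\theta_k$,

$$ \log\mathcal{L}_O(\theta) \approx \log\mathcal{L}_{max} + \frac{1}{2!}\,\frac{\partial^2\log\mathcal{L}_O}{\partial O^i\,\partial O^j}\,J_{O\,ik}\,J_{O\,jl}\,(\theta - \theta_{max})_k\,(\theta - \theta_{max})_l . \qquad (15) $$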



One recognizes in that expression the quantity $\partial^2\log\mathcal{L}_O/\partial O^i\,\partial O^j$, which up to a minus
sign is the observed Fisher information associated with the $O$ measurement, $I\{\mathcal{L}_O\}(O)$.






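We rescale $O$ by $O_{ex}$ to obtain a dimensionless Jacobian and a dimensionless
Fisher information associated with $O/O_{ex}$, such that Eq. (15) becomes

$$ \log\mathcal{L}_O(\theta) \approx \log\mathcal{L}_{max} + \frac{1}{2!}\,\frac{\partial^2\log\mathcal{L}_O}{\partial\log O^i\,\partial\log O^j}\,J_{\log O\,ik}\,J_{\log O\,jl}\,(\theta - \theta_{max})_k\,(\theta - \theta_{max})_l . \qquad (16) $$

The dimensionless Fisher information $-\partial^2\log\mathcal{L}_O/\partial\log O^i\,\partial\log O^j$ describes the amount
of relative uncertainty associated with $O$. We will denote it as $\Sigma^{-1}$ from now on.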

Given this expansion of $\mathcal{L}_O$, we can reconsider our central quantity, the Bayesian
evidence $p(O = O_{ex}|\mathcal{M}, d)$. This evidence can be written as a continuous sum over all
the values of the parameters:

$$ p(O = O_{ex}|\mathcal{M}, d) = \int_{\mathcal{D}} \mathcal{L}_O(\theta)\,p(\theta)\,d\theta . \qquad (17) $$

It is the average of $\mathcal{L}_O(\theta)$ weighted by the prior density of the model parameters
$p(\theta) = p(\theta|\mathcal{M}, d)$. We will denote the Fisher information associated with this prior density
as $I\{p(\theta)\} = |V|^{-1}$, and designate $|V|^{1/2}$ as the "prior volume".



Provided that the likelihood is informative with respect to the prior, Eq. (17) takes
the form

$$ p(O = O_{ex}|\mathcal{M}, d) = \mathcal{L}_{max}\,\frac{|\Sigma|^{1/2}}{|V|^{1/2}} \int_{\mathcal{D}_{ex}} \frac{1}{C}\,d\sigma(\theta) . \qquad (18) $$

Here, $d\sigma(\theta)$ is the induced integration measure on the manifold $\mathcal{D}_{ex}$, and $C$ is the Jacobian
factor,

$$ C = \left|\det\left(J_{\log O}\,J_{\log O}^{t}\right)\right|^{1/2} . \qquad (19) $$

From the point of view of Fisher information, $C$ measures how much information about
the parameters is contained in $O/O_{ex}$ regardless of the uncertainty $\Sigma$. The interesting
fact is that $C$ is a generalized version of the sensitivity measure $c$, such that Eq. (18)
makes the link between the two formulations of naturalness.
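The Jacobian factor of Eq. (19) is straightforward to evaluate once the Jacobian of the log-observables is known. The following is a minimal sketch in pure Python; it assumes that the Jacobian is given as a list of rows, one per observable, and the helper names `determinant` and `generalized_sensitivity` as well as the numerical entries are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det

def generalized_sensitivity(jacobian):
    """C = |det(J J^t)|^(1/2) for an m x n Jacobian (m observables, n parameters)."""
    m = len(jacobian)
    gram = [[sum(x * y for x, y in zip(jacobian[k], jacobian[l])) for l in range(m)]
            for k in range(m)]              # the m x m matrix J J^t
    return math.sqrt(abs(determinant(gram)))

# One observable, two parameters: C is the norm of the gradient
single = generalized_sensitivity([[3.0, 4.0]])
# Two observables with orthogonal gradients: C is the product of the norms
double = generalized_sensitivity([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
# Two linearly dependent observables: C vanishes
degenerate = generalized_sensitivity([[1.0, 2.0], [2.0, 4.0]])

print(single, double, degenerate)
```

The matrix whose determinant is taken has the size of the number of observables, so the cost of the evaluation does not grow with the number of parameters beyond the inner products.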



Some remarks are in order. Firstly, Eq. (18) holds in the limit where $C|V|^{1/2} \gg |\Sigma|^{1/2}$
over $\mathcal{D}_{ex}$. We will designate the likelihood as informative when this condition is
fulfilled. When the condition is not satisfied, the overlap between the likelihood and the
prior has to be taken into account properly, and the Bayesian evidence tends to $\mathcal{L}_{max}$.
Secondly, we emphasize that $J_{\log O}\,J_{\log O}^{t}$ is indeed an $m \times m$ matrix. Its size does not
depend on the number of parameters, but on the number of observables. Thirdly, by
choosing a pointlike prior, or if the other data $d$ are sufficiently constraining, $\mathcal{D}_{ex}$ is
reduced to a single point and the integral $\int C^{-1}d\sigma(\theta)$ is reduced to $C^{-1}|_{\theta_{ex}}$. In
such a situation, we get closer to the sensitivity definition, which is a local measure.

We also emphasize that the derivatives which appear within $C$ depend on the choice
of prior. Indeed, in all generality, these derivatives are performed with respect to the
"prior repartition function" (the cumulative distribution of the prior) $G(\theta)$, defined such
that $p(\theta)d\theta = dG(\theta)$. Let us illustrate this fact by considering a single observable and a
flat prior on all parameters, restricted to the volume $[a_1, b_1] \times \ldots \times [a_n, b_n]$. The prior
volume is $|V|^{1/2} = (b_1 - a_1) \times \ldots \times (b_n - a_n)$, and the Jacobian factor is
$C = \sqrt{\sum_i |\partial\log O/\partial\theta_i|^2}$. Instead, if one chooses a logarithmic prior for all parameters,
$p(\theta_i) \propto \theta_i^{-1}$, the prior volume becomes
$|V|^{1/2} = (\log b_1 - \log a_1) \times \ldots \times (\log b_n - \log a_n)$, and the Jacobian factor becomes
$C = \sqrt{\sum_i |\partial\log O/\partial\log\theta_i|^2}$.
The derivatives in $C$ are then taken with respect to $\log\theta$ instead of $\theta$. In the case of
dimensionful parameters, the choice of the logarithmic prior has a particular meaning,
because it is the most objective prior.

Through these several remarks, we can finally state that the factor $C$ entering
$\int C^{-1}d\sigma(\theta)$ reduces to the $c$ measure of the sensitivity formulation, provided that one
considers a single observable, a single point in $\mathcal{D}_{ex}$, and that one gives logarithmic priors
to the parameters. It is more precisely the expression $c_b$, used in [13], which is exactly
reproduced. The measure $c_a$ is an approximation of $c_b$ when one of the components of the
gradient dominates.

Interestingly, the average $\int C^{-1}d\sigma(\theta)$ has been proposed in [7], in an attempt to
normalize the $c$ measure. In our approach this quantity arises naturally, and we also see
that in itself it does not help to interpret the $c$ measure. On the other hand, the use of the
volume of parameter space has been proposed in [8], in an attempt to build an alternative
measure. These different ideas, somewhat intuitive as such, become rigorously usable
once they appear together through Eq. (18). Our approach also justifies Bayesian studies
which introduce the inverse sensitivity as a "naturalness prior". This is not new; it was
already explained in [5]. However, we add that the prior of the parameters has to correspond
to the derivatives taken in $C$, to keep the approach consistent.

The factor $C$ is a generalization of the $c$ measure. Among other things, it tells
us what the information content of several, possibly correlated observables is. Let us
consider the case of two observables. In this case, $C$ is nothing but the norm of the
wedge product of the gradients,

$$ C = \| \nabla\log O_1 \wedge \nabla\log O_2 \| , \tag{20} $$

which is also

$$ C = \left( \|\nabla\log O_1\|^2 \, \|\nabla\log O_2\|^2 - (\nabla\log O_1 \cdot \nabla\log O_2)^2 \right)^{1/2} . \tag{21} $$



It is instructive to discuss the behaviour of $C$ depending on the correlation between the
two observables which is induced by the model. One can rewrite $C$ as

$$ C = C_1 C_2 \sqrt{1 - \rho^2} , \tag{22} $$

where $C_1$ and $C_2$ are the one-dimensional sensitivities and $\rho$ is the correlation in the
model, defined by

$$ \rho = \frac{|\nabla\log O_1 \cdot \nabla\log O_2|}{\|\nabla\log O_1\| \, \|\nabla\log O_2\|} . \tag{23} $$

If the observables are independently predicted, the two gradients are orthogonal
and thus $\rho = 0$. Eq. (21) reduces in that case to the product of the one-dimensional $C$
measures. On the contrary, if the observables are correlated within the model, one has
$\rho > 0$ and $C$ decreases. From the Bayesian point of view, this should be interpreted as
the fact that it is more economical for a model to predict correlated observables than
independent observables. One may be worried that Eq. (21) tends to zero in the limit of
total correlation, when the two observables are linearly dependent. However, the formula
does not apply in that limit. Indeed, recall that the condition for having informative
data is $C |V|^{1/2} \gg |\Sigma|^{1/2}$. It translates here as an upper bound on the correlation $\rho$,

$$ 1 - \rho^2 \gg \frac{|\Sigma|}{|V| \, (C_1 C_2)^2} . \tag{24} $$

For instance, for a pair of Gaussian measurements, one has $|\Sigma|^{1/2} = \sigma_1 \sigma_2 \sqrt{1 - \rho_{exp}^2}$.



When Eq. (24) is not satisfied, the correlation is too large, such that the two observables
are not separately informative. That is, instead of two constraints, the model effectively
feels a single constraint $O = O_1$, with $O_1$ approximately proportional to $O_2$. Instead of Eq. (21), it is then a one-dimensional
sensitivity associated to $O$ which appears in the Bayesian evidence Eq. (18). This discussion
also illustrates the fact that $C$ taken outside of the formula Eq. (18) can induce
misunderstandings, and has to be interpreted with care.
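
The equivalence between the wedge-product form and the correlation form of the two-observable sensitivity can be verified on numbers. The following is a minimal sketch, assuming two hypothetical gradients of $\log O_1$ and $\log O_2$ in a two-parameter space; the numerical values and helper names are illustrative only.

```python
import math

def dot(u, v):
    """Euclidean scalar product of two gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def combined_sensitivity(g1, g2):
    """Norm of the wedge product of two gradients, written as a Gram determinant."""
    gram = dot(g1, g1) * dot(g2, g2) - dot(g1, g2) ** 2
    return math.sqrt(max(gram, 0.0))

def correlation(g1, g2):
    """Model-induced correlation: |g1 . g2| / (|g1| |g2|)."""
    return abs(dot(g1, g2)) / math.sqrt(dot(g1, g1) * dot(g2, g2))

# Hypothetical gradients of log O_1 and log O_2 with respect to (log theta_1, log theta_2)
g1 = [3.0, 0.0]
g2 = [4.0, 4.0]

C1 = math.sqrt(dot(g1, g1))
C2 = math.sqrt(dot(g2, g2))
rho = correlation(g1, g2)
C = combined_sensitivity(g1, g2)

print(f"C = {C:.4f}, C1*C2*sqrt(1-rho^2) = {C1 * C2 * math.sqrt(1 - rho**2):.4f}")
```

With orthogonal gradients the same functions return $\rho = 0$ and $C = C_1 C_2$, while nearly colinear gradients drive $C$ towards zero, which is the regime where the informative-data condition has to be checked before using the formula.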

At this point, puzzling observations can be made about priors and the meaning of
the logarithms which appear everywhere. What is after all the reason for having $\log O$ in
$C$? Is it for the sake of making $C$ dimensionless? Or is it for the sake of measuring
a relative variation, as we naively stated in Section 2? Here we assumed that $O$ is
dimensionful. By doing so, we avoid this discussion at first, since rescaling $O$ by
$O_{ex}$ makes $C$ both dimensionless and a measure of relative variation. Also, although
the log in $\partial\log O$ is suggestive of an objective prior, this remains just a way of writing
$\partial O/O_{ex}$. And, anyway, speaking about a prior for $O$ does not actually make sense for
the moment, as $O$ is determined by the parameters. These puzzles will be resolved
when examining the Bayes factor $B_{\mathcal{MP}}$ in the next subsection.

To summarize, we find that the sensitivity formulation of naturalness turns out to
be embedded in the probability formulation. The $c$ measure, Eq. (11), turns out to
be a particular case of the factor $C$, arising in the Bayesian evidence Eq. (18). The
only assumption made to obtain this result is the Laplace approximation, i.e. that the
likelihood function can be reduced to a normal law. We will now examine the different
Bayes factors $B$ that can be constructed, and the role played by the $C$ measure in each of them.



4.3 The different versions of B 
4.3.1 $B_{\mathcal{MX}}$

As a warm-up, let us examine the Bayes factor comparing a model $\mathcal{M}$ to the fully natural
model $\mathcal{X}$,

$$ B_{\mathcal{MX}} = \frac{p(O = O_{ex}|\mathcal{M})}{p(O = O_{ex}|\mathcal{X})} . \tag{25} $$

By definition, $\mathcal{X}$ satisfies $O = O_{ex}$ in all its parameter space. The evidence of this ideal
model is thus $p(O = O_{ex}|\mathcal{X}) = \mathcal{L}_{max}$. Recall that $\mathcal{L}_{max}$ is an overall normalization
constant, which cancels once we consider the ratio of evidences. Assuming that
the data $O = O_{ex}$ is informative for $\mathcal{M}$, $B_{\mathcal{MX}}$ takes the form

$$ B_{\mathcal{MX}} = \frac{|\Sigma|^{1/2}}{|V|^{1/2}} \int_{\mathcal{D}_{ex}} C^{-1} d\sigma(\theta) . \tag{26} $$

Clearly, since $\mathcal{M}$ cannot be more natural than $\mathcal{X}$, $B_{\mathcal{MX}}$ cannot be larger than one.
At most, it can tend to one, if $\mathcal{M}$ tends to be an ideal model like $\mathcal{X}$. Equation (26)
is not valid in this limit, as it implies that data is not informative for $\mathcal{M}$ anymore.



Let us interpret what Eq. (26) is telling us as a Bayes factor. One sees that $B_{\mathcal{MX}}$
decreases when $|\Sigma|$ decreases. This is because when the constraint $O = O_{ex}$ becomes more precise,
$\mathcal{M}$ is penalized, but not $\mathcal{X}$. Also, $B_{\mathcal{MX}}$ decreases as $|V|$ increases, because it penalizes
the waste of parameter space of $\mathcal{M}$ excluded by $O = O_{ex}$. Finally, $B_{\mathcal{MX}}$ also decreases
with the sensitivity $C$. $C$ measures the amount of information that $O$ carries about the
parameters of $\mathcal{M}$. The larger $C$ is, the more information $O$ contains, and the stronger the
constraint $O = O_{ex}$ is for $\mathcal{M}$, regardless of the experimental uncertainty.

$B_{\mathcal{MX}}$ is certainly useful to understand the content of the Bayesian evidence. On the
other hand, it is a priori not very useful in a concrete application, as it will just tell us
that the model under consideration is less good than the ideal model. Does it provide
a good basis to give an absolute interpretation to $C$? The interpretation of $C$ would
inevitably depend on $|\Sigma|$. It would be necessary that $|\Sigma|$ be intrinsically bounded from
below, independently of the details of the experimental observations. This can actually
happen, for instance in quantum theories when observables do not commute. But, as
far as we know, nothing of this kind happens in a domain of physics having naturalness
issues.



4.3.2 $B_{\mathcal{MP}}$

Consider now the Bayes factor comparing the model $\mathcal{M}$ to a model $\mathcal{P}$ in which the
observables $O_{1 \ldots m}$ (or any linear combination of them) are directly input parameters,
such that $C = 1$. It is defined by

$$ B_{\mathcal{MP}} = \frac{p(O = O_{ex}|\mathcal{M})}{p(O = O_{ex}|\mathcal{P})} . \tag{27} $$

We recall that the model $\mathcal{P}$ is dubbed "puzzle", since from the point of view of
sensitivity, it is at the limit between predictivity and fine-tuning. In this case, the prior
density for $O$ will enter the game, as $O$ itself is an input parameter of $\mathcal{P}$.

The Bayesian evidence of $\mathcal{P}$ is

$$ p(O = O_{ex}|\mathcal{P}) = \mathcal{L}_{max} \frac{|\Sigma|^{1/2}}{|V_O|^{1/2}} . \tag{28} $$

$|V_O|^{1/2}$ is the prior volume associated to the parameter $O$. In this expression, one
can introduce the ratio $O/O_{ex}$, as we did for the Bayesian evidence of $\mathcal{M}$,
$p(O = O_{ex}|\mathcal{M})$, given in Eq. (18). By doing so, $\Sigma$ is the same relative uncertainty,
$\Sigma^{-1} = -\partial^2 \log\mathcal{L} / \partial\log O_i \, \partial\log O_j$, as the one which appears in $p(O = O_{ex}|\mathcal{M})$. The two
thus cancel in $B_{\mathcal{MP}}$, such that

$$ B_{\mathcal{MP}} = \frac{|V_O|^{1/2}}{|V|^{1/2}} \int_{\mathcal{D}_{ex}} C^{-1} d\sigma(\theta) . \tag{29} $$

With this choice, $C \propto \partial\log O/\partial\ldots$, and the prior volume $V_O$ is dimensionless. $V_O$ is
however not determined. To do so, the prior of $O$ would need to be specified.

To go further and specify a particular prior for $V_O$, it is necessary to impose a
condition, referring to some principle. Interestingly, there are two different principles,
leading to two different conditions, which lead to the same result. Firstly, one can invoke
the principle of indifference introduced in Section 3. Applied to a dimensionful quantity,
it states that our a priori degree of belief should not depend on the unit scale. This
translates as the invariance of $p(O|\mathcal{P})$ under the transformation $O \to O \times b$, which
imposes the logarithmic prior $p(O|\mathcal{P}) \propto O^{-1}$.

But there is a second principle which gives the same result. In this section, we
restricted our discussion to a dimensionful $O$. However, there is no specification made
about the actual dimension of $O$. It seems legitimate to require that the whole approach
leads to the same outcome whatever the dimension of $O$ is. Said differently, we require
that the measure of naturalness should not depend on a redefinition of $O$ changing its
dimension. We will refer to this property as "consistency" of the naturalness measure.
It translates as the invariance of $B_{\mathcal{MP}}$ under the transformation $O \to O^a$. The
consequence of imposing this condition is once again that $p(O|\mathcal{P})$ be the logarithmic
prior $p(O|\mathcal{P}) \propto O^{-1}$.

We thus find that the logarithmic prior is independently motivated by the principle 
of indifference and by the consistency of the measure. This consistency condition is 
a kind of principle of indifference, applied to a Bayes factor instead of a probability. 
Depending on the point of view adopted, one can either claim that the consistency of 
the measure leads automatically to an objective prior, or that the principle of indifference 
leads automatically to a consistent measure. In any case, $B_{\mathcal{MP}}$ is finally invariant under
the transformation $O \to b \times O^a$.

Provided that the prior volumes of both $\mathcal{M}$ and $\mathcal{P}$ can be bounded, $B_{\mathcal{MP}}$ provides a
kind of absolute scale to $C$. As expected, the "puzzle" model $\mathcal{P}$ plays the role of a
reference in terms of sensitivity, to which $\mathcal{M}$ can be compared. Once the volumes are
determined, $C$ is directly related to Jeffreys' scale. The interpretation of $C$ in terms
of degree of belief does not depend on the definition of $O$. Indeed, any redefinition of
$O$ is accompanied by a change in $|V_O|^{1/2} = \int d\log O$, such that the interpretation of
$C$ always remains the same. For instance, in the gauge hierarchy problem, it no longer
matters whether one takes $O = m_Z$ or $O = m_Z^2$, because this is compensated in the prior
volume of $m_Z$, $\int d\log m_Z \to \int d\log m_Z^2$. This consistency property resolves one of the
issues raised in Section 2.
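
The compensation between sensitivity and prior volume can be made explicit with a small numerical sketch. It assumes the structure $B = (V_O / V) \int C^{-1} d\sigma$ with logarithmic priors; the input numbers and function names are hypothetical. Under $O \to b \times O^a$, $\log O$ is multiplied by $a$ (up to a shift), so $C$ is multiplied by $|a|$ and the log-prior volume of $O$ is multiplied by $|a|$ as well.

```python
def bayes_factor_mp(v_obs, v_par, integral_inv_c):
    """B_MP = (V_O / V) * integral of 1/C over D_ex (logarithmic priors assumed)."""
    return v_obs / v_par * integral_inv_c

def redefine_observable(v_obs, integral_inv_c, a):
    """Apply O -> b * O**a: C -> |a| * C, and the log-prior volume V_O -> |a| * V_O."""
    return abs(a) * v_obs, integral_inv_c / abs(a)

# Hypothetical inputs: log-prior volume of O, prior volume of M, integrated 1/C
v_obs, v_par, integral = 40.0, 10.0, 0.02

b_with_mz = bayes_factor_mp(v_obs, v_par, integral)

# Same problem, but using O = m_Z^2 instead of O = m_Z (a = 2)
v_obs_sq, integral_sq = redefine_observable(v_obs, integral, a=2.0)
b_with_mz_squared = bayes_factor_mp(v_obs_sq, v_par, integral_sq)

print(b_with_mz, b_with_mz_squared)
```

The sensitivity alone doubles when $m_Z$ is replaced by $m_Z^2$, but the Bayes factor, and hence the reading on Jeffreys' scale, is unchanged.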



4.3.3 Relative naturalness 

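Finally, let us compare two hypotheses $(\mathcal{M}_0, d_0)$ and $(\mathcal{M}_1, d_1)$. We make the assumption
that the piece of data $O = O_{ex}$ is informative for both models. The Bayes factor

$$ B_{01} = \frac{p(O = O_{ex}|\mathcal{M}_0, d_0)}{p(O = O_{ex}|\mathcal{M}_1, d_1)} \tag{30} $$

takes the form

$$ B_{01} = \frac{|V_1|^{1/2}}{|V_0|^{1/2}} \int_{\mathcal{D}_{0ex}} C_0^{-1} d\sigma(\theta) \left( \int_{\mathcal{D}_{1ex}} C_1^{-1} d\sigma(\theta) \right)^{-1} . \tag{31} $$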

Here, one can see clearly that this naturalness measure balances both the prior
volumes and the sensitivities integrated over the parameter space. $\mathcal{M}_0$ and $\mathcal{M}_1$ can be
two different models, or the same model with different priors, or associated to different
data $d_0$ and $d_1$. When the two models are the same, one has $C_0 = C_1$, but the two
domains of integration are still different. If the two hypotheses differ only by the data
$d_0 \neq d_1$, $B$ gives the "naturalness price" between the two pieces of data. For example,
$d_0$ could be the pre-LHC constraint on SUSY particle masses, and $d_1$ the constraint once
LHC measurements are taken into account.

Within the same model $\mathcal{M}$, one can make the choice of punctual priors, which select
two different points $\theta_0$, $\theta_1$ of $\mathcal{D}_{ex}$. In that case, the prior volumes cancel, leaving only
the Bayes factor

$$ B_{01} = \frac{C|_{\theta_1}}{C|_{\theta_0}} . \tag{32} $$

The Bayes factor is then simply reduced to the comparison of the sensitivities. This
quantity shows clearly that the relative sensitivity within a model has to be interpreted
on the basis of Jeffreys' scale. This is also true for the usual $c$ measure, which is a particular
case of $C$. This finishes resolving all the issues raised in Section 2.

4.4 Generalization and comments

To sum up, the Bayes factor $B_{\mathcal{MP}}$ provides a handle on the absolute interpretation
of $C$, and sets the functional form of all quantities, once either the consistency or the
indifference principle is required. The relative versions of $B$ contain various possibilities
of applications, some of them being reminiscent of previous work in the literature.
We now discuss the generalization, absolute naturalness and some implications
of this approach.

4.4.1 The general case 

First of all, let us explain why we keep a number $m$ of observables $O_{1 \ldots m}$ smaller than or equal
to the number of parameters $n$. If one has $n$ observables, non-proportional to each other,
the model is fully constrained, i.e. $\mathcal{D}_{ex}$ has dimension zero. The likelihood function has
in that case one or several maxima, with an uncertainty $\Sigma$ associated to each of them. If
one adds a new constraint, the effect will be to increase the precision, i.e. reduce $\Sigma$, and
possibly decrease the maximum of likelihood, if this new constraint is not in agreement
with the $n$ previous ones. But from the point of view of the $C$ measure, this new constraint
is necessarily reduced, around the maximum, to a linear combination of the $n$ others.
For that reason, the contribution of this new constraint vanishes in the determinant
contained in the Jacobian factor $C$, and thus cannot influence the sensitivity. Therefore,
for the purpose of the naturalness study, it is sufficient to keep $m \le n$.

The second assumption we made in the beginning of Sect. 4 was that our observables
and parameters were dimensionful. We found that either applying the indifference
principle or requiring consistency of the naturalness measure leads to the invariance of
$p(O)$ under $\log O \to \log O + b$ and of $B_{\mathcal{MP}}$ under $\log O \to a \times \log O + b$, where $a$ is
an $m \times m$ matrix and $b$ an $m$-vector. These conditions imply the use of the logarithmic
prior, such that $C = |\det(J_{\log O} J_{\log O}^t)|^{1/2}$, where $(J_{\log O})_{ij} = \partial\log O_i/\partial\log\theta_j$, and
$V_O = \int d^m \log O_i$, $V = \int d^n \log\theta_j$. All these properties are the consequences of considering
that a transformation law, the change in unit scale, is irrelevant for our degree of
belief about the problem. Let us now go to the general case, by considering arbitrary,
possibly dimensionless, observables and parameters. All the results can be generalized,
provided there exists an irrelevant transformation. Let us assume that the transformation
$G(O) \to G(O) + b$ and the transformation $H(\theta) \to H(\theta) + c$ do not modify our
degree of belief. Then, the naturalness measure is invariant under $G(O) \to a \times G(O) + b$,
the sensitivity takes the form $C = |\det(J_{G(O)} J_{G(O)}^t)|^{1/2}$, where $(J_{G(O)})_{ij} = \partial G(O)_i/\partial H(\theta)_j$,
and the prior volumes are $V_O = \int d^m G(O_i)$, $V = \int d^n H(\theta_j)$.

What we stated above is based on the existence of a continuous irrelevant transfor- 
mation. However, other kinds of conditions, possibly less obvious, can also be found. 
For instance, when a theory is isomorphic to itself under a duality transformation, it is 
possible to find the objective priors of parameters transforming non trivially under the 
duality. 

Finally, it is important to recall that results obtained in Bayesian statistics depend
to some extent on the parametrization of the problem. The choice of parametrization is
somewhat intertwined with the choice of prior for the parameters. The indifference principle
(Sect. 3) plays a crucial role with respect to this issue. It allows us to minimize the
amount of information contained in the priors, or, said differently, it helps to find a
preferred, objective parametrization. For example, it happens that a dimensionless
parameter, whose objective prior is unknown, can be seen as a ratio of two dimensionful
parameters. This is for instance the case of $\tan\beta = v_u/v_d$ in the MSSM. Given that
the objective prior of dimensionful parameters is known, this provides the (non-trivial)
objective prior of the dimensionless parameter. Or equivalently, one can choose these
dimensionful parameters as input (we refer to [28] for an application to the MSSM).

By construction, our Bayesian approach to naturalness inherits all of these features.
However, an extra subtlety is that there is a freedom of parametrization for both the
parameters $\theta_i$ and the observable $O$. Without referring to the indifference principle,
there is no means to favor a particular parametrization, and the naturalness measure is
dependent on this parametrization. Once applied, the indifference principle provides the
objective priors for both the $\theta_i$ and $O$ (as discussed in the analysis of $B_{\mathcal{MP}}$). As a result,
we end up with a unique naturalness measure, depending only on the transformation
properties associated with the indifference principle. The general result is given above
in this section. Speaking more formally, the indifference principle defines an equivalence
class among the parametrizations, and the naturalness measure turns out to be an invariant
of this equivalence class. In the usual example of dimensionful parameters, the
transformation defining the equivalence class is $O \to b \times O^a$. This is nothing but the
two-dimensional set of quantities of arbitrary dimensions. As a consequence, whatever the dimension
of $O$, e.g. $O = m_Z$, or $3 \times m_Z^2$, the naturalness measure remains the same.



4.4.2 About absolute naturalness 



The 'puzzle' model $\mathcal{P}$ provides a reference in terms of naturalness. It is the only sensible
reference we are able to find. But how can it be defined in practice? Let us try to do
this for the gauge-hierarchy problem.

The Bayes factor associated to this problem is

$$ B_{\mathcal{MP}} = \frac{p(m_Z = m_{Z\,ex}|\mathcal{M}, d)}{p(m_Z = m_{Z\,ex}|\mathcal{P}, d')} . \tag{33} $$

The pieces of data $d$ and $d'$ have to be identical, as our goal is not to compare different
data. Which information is contained in $d$? By construction, in our approach, all
experimental information available is split into two categories. There is the one which
contributes to indicating what the electroweak scale is, which is called $m_Z = m_{Z\,ex}$, and
the one which does not, which is called $d$. With only the knowledge $d$, one would know for
example the strength of gravity and gauge interactions, the fermion and hadron masses,
but neither the electroweak boson masses nor the Fermi constant. Such a situation is
of course impossible to imagine in practice, but here we are simply splitting a set of
existing information, regardless of the way it was obtained.

Now, what should $\mathcal{P}$ be? It is a model which predicts data $d$ and has $m_Z$ both as an
input and an output. We can imagine that it is a kind of quantum field theory in which
the weak scale does not receive any quadratic corrections, for some unknown reason.
What should be the prior volume of $m_Z$? We know both from the indifference principle
and consistency of the measure that $m_Z$ should have a logarithmic prior. The bounds
of this density remain to be found. Given that $d$ contains the knowledge of gravity, $\mathcal{P}$
has a cutoff at the Planck mass, so $m_Z < M_{Pl}$. On the other hand, as $d$ contains the
quark masses, $m_Z$ is bounded from below due to unitarity of quark scattering by weak
currents (see [24]), which implies roughly $m_Z \gtrsim 10$ GeV. The prior volume $V_{m_Z}$ in the
model $\mathcal{P}$ is therefore $V_{m_Z} = \log(M_{Pl}/10\,\mathrm{GeV}) \approx 40.0$. This completes the definition of
$\mathcal{P}$. Using the Laplace approximation, the Bayes factor is

$$ B_{\mathcal{MP}} = \frac{V_{m_Z}}{|V|^{1/2}} \int_{\mathcal{D}_{ex}} C_{m_Z}^{-1} d\sigma(\theta) , \tag{34} $$


where $C_{m_Z} \propto \partial\log m_Z/\partial\ldots$, and $|V|^{1/2}$ is the prior volume of $\mathcal{M}$. With this equation,
for any choice of $\mathcal{M}$, $\int_{\mathcal{D}_{ex}} C_{m_Z}^{-1} d\sigma(\theta)$ maps onto Jeffreys' scale up to a known constant.
Therefore we get the absolute interpretation of the sensitivity measure.
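
The quoted prior volume of about 40 can be reproduced with a one-line computation. The sketch below assumes a natural logarithm and the reduced Planck mass (about $2.4 \times 10^{18}$ GeV); both are assumptions made here for illustration, since the text only gives the formula $\log(M_{Pl}/10\,\mathrm{GeV})$. The unreduced Planck mass is shown for comparison.

```python
import math

# Assumed input values (not specified in the text), in GeV
M_PL_REDUCED_GEV = 2.435e18   # reduced Planck mass
M_PL_FULL_GEV = 1.221e19      # unreduced Planck mass
MZ_LOWER_BOUND_GEV = 10.0     # rough unitarity lower bound quoted in the text

def log_prior_volume(upper_gev, lower_gev):
    """Volume of a logarithmic prior on [lower, upper]: integral of d log m."""
    return math.log(upper_gev / lower_gev)

V_mZ_reduced = log_prior_volume(M_PL_REDUCED_GEV, MZ_LOWER_BOUND_GEV)
V_mZ_full = log_prior_volume(M_PL_FULL_GEV, MZ_LOWER_BOUND_GEV)

print(f"V_mZ (reduced Planck mass) = {V_mZ_reduced:.1f}")
print(f"V_mZ (full Planck mass)    = {V_mZ_full:.1f}")
```

Because the volume is logarithmic, the choice of convention for the Planck mass or of the exact lower bound changes $V_{m_Z}$ only by a few percent.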

One may or may not be satisfied with this approach. In any case, it illustrates that it is
not so obvious to define $\mathcal{P}$ in practice. This, however, does not take away the general
results obtained by studying $B_{\mathcal{MP}}$.



4.4.3 Second order fine-tuning 

When considering a naturalness map, the following question often arises. The
interest of a naturalness map is to select regions of the parameter space which have a
relatively low fine-tuning. But suppose that a very tiny region of the parameter space
has a very small $C$, while $C$ is significantly larger around it, in at least one direction. Selecting
this tiny region and discarding the zone around it would itself be an act of fine-tuning!
We will refer to this issue as "second order fine-tuning". How is this taken into account
in our framework?

It is easy to guess that there is a relation to the choice of punctual priors, which select
single points of $\mathcal{D}_{ex}$. Indeed, if $1/C$ was integrated around the tiny zone with small $C$,
the particularity of that zone would disappear. Formally, the action of selecting regions
with low fine-tuning corresponds to imposing a prior such that $C < C_{min} + \Delta C$, where $\Delta C$
is a level of tolerance, and $C_{min}$ is a minimal value. One can construct a Bayes factor
comparing two regions $\mathcal{D}_{0ex}$, $\mathcal{D}_{1ex}$ of the parameter space, containing the minima $C_{0\,min}$,
$C_{1\,min}$ respectively, and with the requirement of an upper bound on $C$. Several versions
can be built, depending for example on whether $C_{min}$ is considered as a common value,
or if $C_{min} = C_{0\,min}, C_{1\,min}$ respectively. These versions correspond to different reasonings.
Provided that $C^{-1}$ can be approximated over the domain considered, such Bayes factors
can be computed analytically.

For example, let us consider the Bayes factor comparing two regions $\mathcal{D}_{0ex}$, $\mathcal{D}_{1ex}$ of
the parameter space, containing minima $C_{0\,min}$, $C_{1\,min}$ which are not on the boundaries.
We impose the condition $C < C_{min} + \Delta C$, where $C_{min}$ is a common value. It can be
$\min(C_{0\,min}, C_{1\,min})$ or a smaller value. It does not matter, since it will not appear in
the final result. These two domains are denoted as $\mathcal{D}'_{0ex}$, $\mathcal{D}'_{1ex}$. The Bayes factor reads

$$ B_{01} = \int_{\mathcal{D}'_{0ex}} C^{-1} d\sigma(\theta) \Big/ \int_{\mathcal{D}'_{1ex}} C^{-1} d\sigma(\theta) . \tag{35} $$




When $\Delta C$ is not too large, one can take the Laplace approximation of $C$ around $C_{0\,min}$
and $C_{1\,min}$. In that limit, the integration can be done, and one obtains the Hessian of
$\log C$,

$$ H = \det(\nabla_i \nabla_j \log C)|_{C_{min}} . \tag{36} $$

As by assumption both boundaries $\partial\mathcal{D}'_{i\,ex}$ are inside the corresponding boundaries
$\partial\mathcal{D}_{i\,ex}$, the Bayes factor reduces to

$$ B_{01} = \frac{C_{1\,min}}{C_{0\,min}} \left( \frac{H_1}{H_0} \right)^{1/2} . \tag{37} $$

We can see that a new factor $(H_1/H_0)^{1/2}$ appears in addition to $C_{1\,min}/C_{0\,min}$. This is
the quantity which accounts for the second order fine-tuning.

More generally, for Bayes factors where the minima are on the boundaries, there
will be contributions of the form

$$ B_{01} \propto \prod_i \left| \frac{\partial C}{\partial\theta_i} \right|_{C_{1\,min}} \Big/ \left| \frac{\partial C}{\partial\theta_i} \right|_{C_{0\,min}} \tag{38} $$

coming in. Terms in Eqs. (37), (38) provide a comparison of the steepness of $C$ around
the two minima. This is properly quantified and interpreted in terms of naturalness with
the Bayesian approach.
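
The role of the curvature of $\log C$ around a minimum can be illustrated in one dimension. The sketch below is a toy check under explicit assumptions: $\log C$ is taken exactly quadratic, $\log C(x) = \log C_{min} + H x^2/2$, the integration window is wide enough to contain the whole Gaussian bulk of $1/C$, and all numbers are hypothetical. In that case $\int C^{-1} dx = C_{min}^{-1} \sqrt{2\pi/H}$, so the ratio of two such integrals is $(C_{1\,min}/C_{0\,min}) (H_1/H_0)^{1/2}$.

```python
import math

def integral_inv_C(c_min, hessian, half_width=10.0, n=20001):
    """Trapezoidal integral of 1/C(x), with log C(x) = log c_min + hessian * x^2 / 2."""
    step = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * hessian * x * x) / c_min
    return total * step

# Hypothetical minima: region 0 is less tuned but much narrower than region 1
C0_min, H0 = 5.0, 4.0
C1_min, H1 = 10.0, 1.0

B01_numeric = integral_inv_C(C0_min, H0) / integral_inv_C(C1_min, H1)
B01_laplace = (C1_min / C0_min) * math.sqrt(H1 / H0)

print(f"numeric: {B01_numeric:.6f}, Laplace: {B01_laplace:.6f}")
```

In this example the factor of two gained by region 0 from its smaller $C_{min}$ is exactly cancelled by its narrowness, so neither region is preferred: this is the second order fine-tuning at work.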



4.4.4 The top Yukawa in the gauge hierarchy problem 

A recurring question about the gauge hierarchy problem is whether or not the top quark
Yukawa coupling $y_t$ should be considered as an input parameter, such that the derivative
$\partial m_Z/\partial y_t$ appears in the $C_{m_Z}$ measure. On one hand, one can think of $y_t$ as a simple
constant, and not a parameter. In that case, it should not appear in $C_{m_Z}$. On the other
hand, one can think of it as an input parameter, fixed by the experiment. In that case,
$y_t$ must appear in the $C_{m_Z}$ measure. So what is the right point of view? Surprisingly,
it is the first proposition which makes sense. To understand this, we have to examine
more carefully the second proposition.

Indeed, the choice of considering $y_t$ as an input parameter or a constant is just a
matter of viewpoint, and should not modify the information content of our study. This
implies that if $y_t$ is taken as an input parameter, one has to add to the set of experimental
constraints the top quark mass measurement, $m_t = m_{t\,ex}$. But the observables $m_Z$ and
$m_t$ are not independent in the model. Therefore, to study naturalness of the gauge-hierarchy
problem, they need to be simultaneously taken into account. It is thus the
combined sensitivity $C_{m_Z,m_t}$ which has to be used when $y_t$ is seen as an input parameter.

At this point, it is instructive to wonder what the common fine-tuning associated
to a generic observable $O$ and an observable $\Theta$ which is directly an input parameter is.
The set of the input parameters is denoted as $p_i = (\theta_j, \Theta)$. We assume a logarithmic
prior for all quantities for concreteness. If $O$ and $\Theta$ are independent in the model, the
common sensitivity

$$ C_{O,\Theta} = \left\| \frac{\partial\log O}{\partial\log p_i} \wedge \frac{\partial\log\Theta}{\partial\log p_i} \right\| \tag{39} $$

factorizes and reduces to

$$ C_O = \left\| \frac{\partial\log O}{\partial\log\theta_j} \right\| , \tag{40} $$

given that $C_\Theta = 1$. But what happens when the two observables are correlated? It turns
out that the answer is the same. Whatever the "puzzle" observable is, the sensitivity
always reduces to $C_{O,\Theta} = C_O$. We emphasize that, although all priors are chosen to be
logarithmic there, these kinds of results hold whatever the priors are.

Let us come back to the top Yukawa and the $C_{m_Z,m_t}$ measure. The previous remark
does not apply directly, because the observable is not $y_t$, but rather the top mass
$m_t = y_t \times v$. Thus $y_t$ does not play the same role as $\Theta$. However, the outcome will in fact be
the same. We denote the set of input parameters as $p_i = (\theta_j, y_t)$. The objective prior of
$m_t$ is logarithmic, and it implies that the prior of $y_t$ is also logarithmic. We also assume
for simplicity that the priors of the $\theta_j$ are logarithmic. The sensitivity is then

$$ C_{m_Z,m_t} = \left\| \frac{\partial\log m_Z}{\partial\log p_i} \wedge \frac{\partial\log(v y_t)}{\partial\log p_i} \right\| . \tag{41} $$

As $m_Z$ is directly related to $v$, the gradients $\partial\log m_Z/\partial\log p_i$ and $\partial\log v/\partial\log p_i$ are
colinear. The $v$ contribution therefore vanishes in the sensitivity measure. The remaining
part contains $y_t$, which plays the same role as $\Theta$ in the previous paragraph. As a
consequence the sensitivity reduces to $C_{m_Z}$, without the parameter $y_t$,

$$ C_{m_Z,m_t} = \left\| \frac{\partial\log m_Z}{\partial\log\theta_i} \right\| . \tag{42} $$

This result holds whatever the priors are. Thus, to reply to the initial question, the
second proposition gives in fact the same result as the first proposition, after a careful
examination: $y_t$ should not appear in $C_{m_Z}$.
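
This cancellation can be checked on a toy gradient. The sketch below assumes that $m_Z \propto v$ at fixed gauge couplings, so that $\log v$ and $\log m_Z$ have the same gradient, and that $\log m_t = \log v + \log y_t$; the numerical gradient, which includes a non-zero derivative with respect to $\log y_t$, is hypothetical.

```python
import math

def dot(u, v):
    """Euclidean scalar product of two gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def wedge_norm(g1, g2):
    """Norm of the wedge product of two gradients (Gram determinant form)."""
    gram = dot(g1, g1) * dot(g2, g2) - dot(g1, g2) ** 2
    return math.sqrt(max(gram, 0.0))

# Hypothetical gradient of log m_Z with respect to (log theta_1, log theta_2, log y_t)
grad_log_mz = [30.0, -40.0, 7.0]

# log m_t = log v + log y_t, with grad(log v) = grad(log m_Z) by assumption
grad_log_mt = [g + e for g, e in zip(grad_log_mz, [0.0, 0.0, 1.0])]

C_combined = wedge_norm(grad_log_mz, grad_log_mt)

# One-observable sensitivity of m_Z with y_t removed from the parameter set
C_mz_without_yt = math.hypot(grad_log_mz[0], grad_log_mz[1])

print(C_combined, C_mz_without_yt)
```

The derivative with respect to $\log y_t$ drops out of the combined sensitivity whatever its value, which is the content of the statement that $y_t$ should not appear in $C_{m_Z}$.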



4.4.5 Consequences of LHC searches 

The existence of a scalar resonance whose properties are roughly compatible with those
of a Higgs boson has been established beyond reasonable doubt at the LHC [22][23]. The
mass of this new state is a stringent constraint on many models of new physics. Some of
them are almost excluded, barring some very specific choices of parameters. As a result,
this constitutes a new naturalness problem associated to the Higgs mass constraint. It
would be therefore particularly appropriate to study the fine-tuning related to the Higgs,
either inside the parameter space of a model, or comparing two different models. The
naturalness measure to use for such a study is

$$ B_{01} = \frac{p(m_h = m_{h\,ex}, m_Z = m_{Z\,ex}|\mathcal{M}_0)}{p(m_h = m_{h\,ex}, m_Z = m_{Z\,ex}|\mathcal{M}_1)} . \tag{43} $$

We emphasize once again that the two observables $m_h$ and $m_Z$, both independently
responsible for some amount of fine-tuning, should not be treated separately, because
their predictions are correlated in the models.

On the other hand, searches at the LHC and other experiments have not, up
to now, shown conclusive evidence of the existence of Beyond Standard Model physics. As
the idea of new physics (NP) at the multi-TeV scale is in part motivated by the gauge-hierarchy
problem, one can wonder to what extent NP models are more natural than the
Standard Model, given the increasing exclusion limits. Let us answer this question
in a very simplified way. For concreteness, we assume the SM to be valid up to the
Planck scale. As the origin of the gauge-hierarchy problem is an issue of cancellations
between squared mass parameters, we will consider a one-parameter "model" embedding
this property. That is, we just define the EW scale as given by $m_Z^2 = M_{Pl}^2 (1 - \delta)$. We
also consider a BSM model suppressing the quadratic corrections to the EW scale at a
scale $M < M_{Pl}$. The EW scale is thus given by $m_Z^2 = M^2 (1 - \delta)$ in this model. Picking
similar priors for the $\delta$ in each hypothesis, the naturalness measure turns out to be

$$ B_{NP,SM} \sim \frac{M_{Pl}^2}{M^2} . \tag{44} $$

We can see that, unless $M$ is close to $M_{Pl}$, this ratio indicates an extremely strong
fine-tuning of the SM, far beyond the typical value of 150 indicated in Jeffreys' scale. If
for instance $M \sim 100$ TeV, one gets $B_{NP,SM} \gtrsim 10^{26}$. We emphasize that, for the sake of
comparing the SM to a NP model substantially improving the gauge-hierarchy problem,
there is no need to set up a more elaborate analysis. This estimate captures the large
leading contribution, which overwhelms any subleading effects.
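
The order of magnitude of this Bayes factor is easy to evaluate. The sketch below assumes the toy relation $m_Z^2 = \Lambda^2 (1 - \delta)$ with $\Lambda = M_{Pl}$ for the SM and $\Lambda = M$ for the NP model, so that the Bayes factor scales as the ratio of squared cutoffs, $M_{Pl}^2/M^2$. The Planck mass convention is not specified in the text, so both the reduced and unreduced values are used as assumptions; the function and variable names are illustrative.

```python
import math

JEFFREYS_STRONG = 150.0  # typical "strong evidence" threshold quoted in the text

def bayes_factor_np_sm(m_np_gev, m_pl_gev):
    """Toy estimate B_NP,SM ~ (M_Pl / M)^2 for similar priors on delta."""
    return (m_pl_gev / m_np_gev) ** 2

M_NP_GEV = 1.0e5  # M = 100 TeV
PLANCK_MASSES_GEV = {"reduced": 2.435e18, "full": 1.221e19}  # assumed conventions

results = {
    label: bayes_factor_np_sm(M_NP_GEV, m_pl)
    for label, m_pl in PLANCK_MASSES_GEV.items()
}

for label, b in results.items():
    print(f"{label}: log10(B_NP,SM) = {math.log10(b):.1f}, "
          f"above Jeffreys' threshold: {b > JEFFREYS_STRONG}")
```

Either convention gives a Bayes factor more than twenty orders of magnitude above the threshold of 150, so the conclusion does not depend on this choice.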

5 Gauge hierarchy problem and neutralino dark matter in
the cMSSM

In this section, we apply our results to a concrete problem. We choose to study the 
naturalness of a classic supersymmetric model, the constrained MSSM (cMSSM), taking 
into account both the gauge hierarchy problem and the fine-tuning of neutralino dark 
matter. 

Supersymmetry solves the gauge hierarchy problem by embedding the Standard
Model fields into supermultiplets, which do not generate quadratic corrections to the
Higgs mass. The simplest realistic model one can build in this way is called the Minimal
Supersymmetric Standard Model (MSSM). But the superparticles which accompany the
SM particles in the supermultiplets are experimentally constrained to be heavier than
their standard partners. This implies that supersymmetry has to be broken. However,
with broken SUSY, the gauge-hierarchy problem is not completely solved. Instead, it
remains in the form of a certain amount of special cancellations between the SUSY
parameters, of typical scale $M_{SUSY}$, necessary to reproduce the Z boson mass. $M_{SUSY}$
is constrained both through direct and indirect observations, and the LHC experiments
are currently improving these direct limits (see e.g. summary plots of ATLAS [26] and
CMS [27]). Roughly speaking, $M_{SUSY}$ is at least $O$(TeV), one order of magnitude above
the Z mass $m_Z \sim 91$ GeV.

One of the simplest and most widely studied versions of the MSSM with broken SUSY is
the constrained MSSM (cMSSM). The parameters of that model are a common gaugino
mass $m_{1/2}$, a common scalar mass $m_0$, a common scalar trilinear coupling $A_0 = a_{ij}/y_{ij}$,
the ratio of the two Higgs vevs $\tan\beta = \langle H_u \rangle / \langle H_d \rangle$, and the sign of the SUSY Higgs
mass term, sign($\mu$) (see e.g. [25] for an introduction to SUSY models). However, this
is a setup which already takes into account $m_Z = m_{Z\,ex}$. As we are interested in the
fine-tuning induced by $m_Z = m_{Z\,ex}$, this constraint must not be incorporated in the
model in the first place. Therefore a new input parameter has to be introduced. It
is in fact interesting to trade $\tan\beta$ for the dimensionful parameters $\mu$ and $B_\mu$ of the
Higgs sector. Indeed, whereas it is not obvious to find an objective density for $\tan\beta$,
the objective densities of $\mu$ and $B_\mu$ are clearly logarithmic. In practice, it is the former
parametrization of the model which is used. In that case, the objective prior of $\tan\beta$
has to be inferred from the priors of $\mu$ and $B_\mu$. This remark was made in the paper [28],
where the resulting density is called "REWSB prior".

Also, the MSSM has another celebrated feature. Its mass spectrum contains the
neutralino, a fermion charged only under the weak force, and which is a mixture of neutral
Higgsinos and gauginos. If the lightest neutralino $\chi_1^0$ is the lightest particle of the
SUSY spectrum, and if a remnant of the $U(1)_R$ symmetry of the SUSY algebra is still
present, it cannot decay directly into SM particles and is therefore stable. Such a particle
is a good dark matter candidate. Under the assumptions that the Cosmological Standard
Model is valid in the early universe, and that the neutralinos were in thermal
equilibrium for some period, today's neutralino density can be precisely predicted using
the Boltzmann equation. This density is the relic remaining after thermal freeze-out,
when the neutralino annihilation rate vanishes due to the expansion of the universe. The
relic density predicted strongly depends on the masses and compositions of all particles
of the spectrum.

The latest release of the dark matter relic density measured by WMAP7 is
$\Omega h^2 = 0.1126 \pm 0.0036$ [29]. One can see where this constraint is satisfied in the plane
$(m_{1/2}, m_0)$ by looking at the lines in the plots of Fig. 1. This figure will be described
in detail below. Experimental uncertainty, by construction, does not appear in the
plots. Lines are set wide only to ease the reading of the color code. Typically, in the
cMSSM, the dark matter relic density predicted is a bit large compared to the observation.
To reproduce this experimental constraint, it is necessary that one or several
processes of neutralino annihilation be particularly efficient [30]. There are at least four
such processes in the cMSSM.

Firstly, the neutralinos can annihilate through the exchange of a scalar. But this
mechanism is only efficient for light sparticles, which are increasingly excluded by the
LHC. Secondly, the annihilation through the exchange of a Higgsino or SU(2) gaugino
is efficient if the mixing with those states is large enough. This happens near the "No
EWSB" zone. Thirdly, a coannihilation with a slepton may dominate if its mass is close to
the neutralino mass, and if both particles are not too heavy. This happens near the
"Charged LSP" zone. Finally, the exchange of a CP-odd Higgs is enhanced near the
resonance pole, when $m_{\chi^0} \sim \frac{1}{2} m_{A^0}$. This happens at large $\tan\beta$ and is
dubbed the "A-pole funnel".

But relying on the efficiency of such processes to obtain the correct value for $\Omega h^2$
requires a rather precise adjustment of parameters. It is therefore an act of fine-tuning.
So if one wants to explain dark matter by the neutralino, one ends up with two naturalness
problems, one induced by the piece of information $m_Z = m_{Z,\mathrm{ex}}$ and the other due
to $\Omega = \Omega_{\mathrm{ex}}$. To study fine-tuning in the cMSSM, it is therefore the common sensitivity
$C_{m_Z,\Omega h^2}$ which must be used. From the sensitivity point of view, one has to consider
the set of fundamental parameters $p_i = \{m_{1/2}, m_0, A_0, \mu, B_\mu\}$. All of those parameters
are defined at the GUT scale. They all have a logarithmic prior as objective density.
On the other hand, although $\Omega = \rho_{CDM}/\rho_c$ is a density rescaled to be made dimensionless,
$\rho_{CDM}$ is dimensionful, so it requires a logarithmic prior as well. The common
sensitivity measure is therefore



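$$C_{m_Z,\,\Omega h^2} = \left| \frac{\partial \log m_Z}{\partial \log p_i} \wedge \frac{\partial \log \Omega h^2}{\partial \log p_i} \right| \qquad (45)$$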



We assume that the experimental uncertainties are sufficiently small, such that Eq. (45)
holds over the whole parameter space. This sensitivity will be denoted $C$ from now on.
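
As an illustration, if Eq. (45) is read as the norm of the wedge product of the two vectors of logarithmic derivatives (an interpretation that should be checked against the general formula derived earlier), it can be evaluated through the identity $|a \wedge b|^2 = |a|^2 |b|^2 - (a \cdot b)^2$. The Python sketch below is only a minimal example under that assumption; the function name and the sample gradients are hypothetical and are not taken from the analysis code.

```python
import math


def combined_sensitivity(grad_a, grad_b):
    """Norm of the wedge product of two log-gradient vectors.

    grad_a[i] stands for d log(O_a) / d log(p_i), and likewise for grad_b.
    Uses |a ^ b|^2 = |a|^2 |b|^2 - (a . b)^2, i.e. the 2x2 Gram determinant.
    """
    aa = sum(x * x for x in grad_a)
    bb = sum(y * y for y in grad_b)
    ab = sum(x * y for x, y in zip(grad_a, grad_b))
    gram_det = aa * bb - ab * ab
    # Round-off can make the determinant slightly negative for parallel vectors.
    return math.sqrt(max(gram_det, 0.0))


# Hypothetical gradients with respect to two parameters only.
print(combined_sensitivity([3.0, 0.0], [0.0, 4.0]))  # orthogonal gradients
print(combined_sensitivity([1.0, 2.0], [2.0, 4.0]))  # parallel gradients
```

In this reading, orthogonal gradients give the product of the two individual norms, while parallel gradients give zero.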

In the MSSM, the top quark mass is given by $m_t = y_t v \sin\beta$. Thus, rigorously, $y_t$
should not be taken as a constant, since what we explained in Subsection 4.4 about the
top Yukawa does not hold here due to the presence of $\sin\beta$. To stay exact, it would
be necessary to consider the sensitivity associated with the three observables, $C_{m_Z,\Omega h^2,m_t}$.
However, the correction induced by adding the observable $m_t$ is small, because in the
$\sin\beta$ contribution the derivative $\partial\log\sin\beta/\partial\log y_t$ is dominant over the other
derivatives. Therefore we choose to work only with the observables $m_Z$, $\Omega h^2$, and keep $y_t$ as
a constant.

We evaluated the dark matter relic density, the sensitivity $C$ and the SUSY spectrum
over slices of the parameter space of the cMSSM. Our analysis was carried out using a
modified version of the spectrum calculator SOFTSUSY interfaced with micrOMEGAs 2.4 [32]
to compute the dark matter relic density. In spectrum calculators, it is not $\mu$ and
$B_\mu$ which are input parameters, but $m_Z$ and $\tan\beta$. This is already taken into account
in SOFTSUSY to compute the $m_Z$ derivatives, but it has to be carefully considered when
implementing the $\Omega h^2$ derivatives. The results obtained are presented in Fig. 1.
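
One possible way to organize this change of variables is sketched below. It is an assumption made for illustration, not necessarily the procedure coded in the modified spectrum calculator. The symbols $q_j$ (the calculator inputs, with $m_Z$ and $\tan\beta$ in place of $\mu$ and $B_\mu$) and $J_{ij}$ are introduced here only for this purpose.

```latex
% Assumed notation: q_j = calculator inputs, p_i = fundamental parameters,
% J_{ij} = Jacobian of the change of variables, in logarithmic form.
\begin{align}
J_{ij} &= \frac{\partial \log p_i}{\partial \log q_j}, \\
\frac{\partial \log \Omega h^2}{\partial \log p_i}
  &= \sum_j \frac{\partial \log \Omega h^2}{\partial \log q_j}\,
     \frac{\partial \log q_j}{\partial \log p_i}
   = \sum_j \frac{\partial \log \Omega h^2}{\partial \log q_j}\,
     \left(J^{-1}\right)_{ji}.
\end{align}
```

The derivatives with respect to $q_j$ and the matrix $J$ can both be obtained from the calculator output, provided $J$ is invertible at the point considered.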

Figure 1 shows slices of the parameter space with $A_0 = 0$, sign($\mu$) = 1 and $m_t =
172.4$ GeV, for $\tan\beta = 10$ and 50. The low mass region of the parameter space is
increasingly excluded by the LHC bounds on sparticle masses (see [26, 27]). Instead
of showing a particular limit, we prefer to plot the gluino and lightest squark masses,
and leave it to the reader to apply their preferred bound. All the plots display the
logarithm of the sensitivity. Following our results of Section 4, the difference between
two points 1 and 2 is given by $|\log C_2 - \log C_1| = \Delta\log C$, which has to be interpreted
on the basis of Jeffreys' scale. That is, $\Delta\log C = 1$, 2.5, 5 correspond to weak, moderate
and strong evidence in favour of point 1, respectively. The statements about naturalness
that we will make when discussing the plots are based on this scale.
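
The reading of the plots described above can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the logarithm is taken to be natural, the point with the smaller sensitivity is taken to be the favoured one, and the label used below the first threshold ("inconclusive") is not from the text. All function names are hypothetical.

```python
import math

# Thresholds on |Delta log C| quoted in the text for Jeffreys' scale.
JEFFREYS_THRESHOLDS = [(5.0, "strong"), (2.5, "moderate"), (1.0, "weak")]


def jeffreys_label(delta_log_c):
    """Return the evidence label associated with a given |Delta log C|."""
    delta = abs(delta_log_c)
    for threshold, label in JEFFREYS_THRESHOLDS:
        if delta >= threshold:
            return label
    # Assumed label for values below the first threshold (not from the text).
    return "inconclusive"


def compare_points(c1, c2):
    """Compare two parameter-space points through their sensitivities.

    Assumes natural logarithms and that the smaller sensitivity is favoured.
    Returns (index of favoured point, |Delta log C|, evidence label).
    """
    delta = abs(math.log(c2) - math.log(c1))
    favoured = 1 if c1 <= c2 else 2
    return favoured, delta, jeffreys_label(delta)


# Hypothetical sensitivities read off at two points of a map.
print(compare_points(10.0, 10.0 * math.exp(3.0)))
```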

Plots in the upper row show maps of electroweak fine-tuning. The fact that $\log C_{m_Z}$
drops near the "No EWSB" zone is due to a feature of the MSSM renormalization
group equations, known as the "focus point" mechanism [33]. In short, the low scale
value of the Higgs soft mass $m_{H_u}^2$ becomes generically small in this region, such that
the cancellations required to reproduce the Z mass are less severe. This feature also
implies a large higgsino fraction for the neutralino, so that the experimental value of
$\Omega h^2$ can be reproduced. In this zone of the parameter space, the predictions of $m_Z$ and
$\Omega h^2$ are therefore particularly correlated by the model.

Plots in the center row show the dark matter fine-tuning. In the $\tan\beta = 10$ slice,
one can see that the coannihilation region has a very strong fine-tuning compared to
the focus point region. Formally, the coannihilation region continues all along the
"Charged LSP" zone with an increasing fine-tuning, but points are so fine-tuned that the
numerical analysis does not render them. In the $\tan\beta = 50$ slice, one can see that the
A-pole funnel and the focus point region have roughly the same naturalness. Relative to
the $\tan\beta = 10$ focus point, these regions have a weak to moderate fine-tuning. On the
other hand, they are strongly more natural than the $\tan\beta = 10$ coannihilation region.
At $\tan\beta = 50$, some very fine-tuned coannihilations can also occur on the border, but
are not shown on the plot. Dark matter fine-tuning has been previously investigated in
the literature, see e.g. [3, 34], with slightly different definitions for $C_\Omega$.



Finally, plots shown in the lower row are for the combined electroweak and dark
matter fine-tuning measure $\log C_{m_Z,\Omega}$. Compared to the dark matter fine-tuning alone,
here the $C$ measure increases with $m_{1/2}$ due to the gauge hierarchy problem. In the
$\tan\beta = 10$ slice, the coannihilation region is still strongly fine-tuned with respect to the
focus point region. In the $\tan\beta = 50$ slice, the fine-tuning of the focus point region
increases by $\Delta\log C \approx 5$ between $m_{1/2} = 500$ GeV and 1500 GeV. At low $m_{1/2}$, this
region is the most favored. At high $m_{1/2}$, it is moderately fine-tuned compared to the
focus point region at $\tan\beta = 10$. The A-pole funnel and the focus point region at high
$m_{1/2}$ are only moderately preferred to the $\tan\beta = 10$ coannihilation region.



Figure 1: Quantified naturalness in the cMSSM. All of these plots are for $A_0 = 0$, sign($\mu$) = 1
and $m_t = 172.4$ GeV, with $\tan\beta = 10$ and 50 for the left and right panels. $m_{1/2}$ and $m_0$ are
given in GeV.

Top row: Maps of the logarithm of the electroweak fine-tuning measure $C_{m_Z}$, normalized to the
point of minimal fine-tuning. Center row: Maps of the logarithm of the dark matter fine-tuning
measure $C_\Omega$. The measure has the same normalization on both plots. Bottom row: Maps of the
logarithm of the combined electroweak and dark matter fine-tuning measure $C_{m_Z,\Omega}$. The measure
has the same normalization on both plots.

Blue and dark blue isolines show the mass of the gluino and the lightest squark with steps of
500 GeV. Following Jeffreys' scale, the relative degree of belief between two points 1 and 2 is
given by $|\log(C_2/C_1)|$, such that threshold values 1, 2.5 and 5 correspond to weak, moderate
and strong evidence for point 1, respectively.



6 Conclusion



The degree of naturalness is often intuitively defined as a sensitivity, although this
approach suffers from several conceptual flaws. We propose a different definition to
formalize naturalness, working in the framework provided by Bayesian statistics. This
approach is self-consistent and, interestingly, turns out to embed the usual sensitivity
definition in a generalized form.

So our approach is not an alternative to the sensitivity. It appears that the sensitivity
is actually a piece, intuitively guessed, of a larger setting. It is not consistent when taken
alone, but its flaws find an explanation once the embedding in the Bayesian framework is
done. In a sense, the essential missing piece was the notion of prior volume, which is
also intuitive on its own. In this paper, we work out the consistent framework bringing
together these notions.

The naturalness measure which appears in this framework is a Bayes factor. The
link between the naturalness measure and our degree of belief, which was missing so
far, is therefore automatically provided by Jeffreys' scale. The generalized sensitivity
which emerges takes into account the fine-tuning of an arbitrary number of correlated
observables. We discussed the two-observable case in detail.

By studying the Bayes factor involving a 'puzzle' model, we found that either the
principle of indifference or the consistency of the measure sets the functional form of
all quantities. For the sensitivity, it entails that it is the objective prior repartition
(cumulative distribution) functions, both for parameters and observables, which appear
in the derivatives. As the 'puzzle' model is a reference in terms of sensitivity, this Bayes
factor gives a handle on the absolute interpretation of $C$.

The Bayesian approach resolves without ambiguity the question of whether or not
the top Yukawa should enter in the gauge-hierarchy measure $C_{m_Z}$. Also, it accounts for
the "second order fine-tuning", which is induced when it is necessary to adjust precisely a
parameter to select a zone with small fine-tuning in the parameter space. Consequences
of recent LHC searches are also discussed.

We present a simple illustration of our results by examining the naturalness of a
supersymmetric model, the cMSSM. The sensitivity formulas associated with the electroweak
scale and the dark matter relic density, taken separately or together, are well-defined, and
differ from some work in the literature. By using Jeffreys' scale, we make statements
about the naturalness of the different dark matter annihilation regions. Roughly speaking,
the focus point region is the winner of the naturalness comparisons, while the
coannihilation region comes last with a strong evidence gap.

Appendix



A Naturalness problems in particle physics and cosmology 

In this Appendix, we recall some of the main naturalness problems. These are the most 
commonly discussed, but the list is not intended to be exhaustive. 

• Gauge hierarchy problem [25]: The electroweak scale, often represented by the
Z boson mass $m_Z$, is $O(100~\mathrm{GeV})$. On the other hand, the Planck scale $M_{Pl} =
\sqrt{\hbar c / 8\pi G_N} = 2.4 \times 10^{18}$ GeV sets the scale at which the theory of quantum
gravity appears. Why is the electroweak scale so small compared to $M_{Pl}$, while it
should receive $O(M_{Pl})$ quantum contributions?

• Strong CP puzzle [35]: From neutron electric dipole measurement, one deduces
that the angle $\theta$, contributing to the QCD Lagrangian as $\mathcal{L}_{QCD} \supset \theta\, \frac{g^2}{32\pi^2} G_{\mu\nu} \tilde{G}^{\mu\nu}$,
is very small, $\theta < 10^{-10}$, while it could take values in $[-\pi, \pi]$.
Then, why is it so close to zero?

• Flavour puzzle [36]: Ratios of successive SM fermion mass eigenvalues, as well as 
CKM angles, are all roughly of the same order. Why do they follow this particular 
structure? 

• Cosmological constant problem [37]: The cosmological constant $\Lambda$, which appears
in Einstein's equations $R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = 8\pi G_N T_{\mu\nu} + \Lambda g_{\mu\nu}$, is estimated to
be $O(10^{-47})~\mathrm{GeV}^4$ by fitting the Standard Cosmological Model to CMB, large
scale structure and supernovae data. Within quantum field theories, it should receive
$O(M_{Pl}^4)$ contributions (or $O(M_{SUSY}^4)$ if there is SUSY). Then why is it so small?

• Flatness problem [38]: In the Standard Cosmological Model, the curvature of
the universe is given by $k/a^2 = H^2(\rho/\rho_c - 1)$, where $H$ is the Hubble constant, $\rho$ is the
total energy density contained in the universe, and $\rho_c = 3H^2/8\pi G_N$. $\rho/\rho_c - 1$ is
estimated to be less than 0.01, and $O(10^{-61})$ at the Planck era. Why did the universe
have such a small curvature?

• Cosmic coincidence [39]: Why are the densities of matter and vacuum energy of
the same order of magnitude, i.e. $\rho_M \sim \rho_\Lambda$? And why now?
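
As a numerical cross-check of the reduced Planck mass quoted in the gauge hierarchy item above, the expression $\sqrt{\hbar c / 8\pi G_N}$ can be evaluated in SI units and converted to GeV. The Python sketch below is illustrative only; the constants are CODATA values and the Z mass is a rounded input, all supplied here as assumptions rather than taken from the text.

```python
import math

# CODATA values in SI units (assumed inputs, not taken from the text).
HBAR = 1.054571817e-34           # reduced Planck constant, J s
C_LIGHT = 299792458.0            # speed of light, m / s
G_NEWTON = 6.67430e-11           # Newton constant, m^3 kg^-1 s^-2
JOULE_PER_GEV = 1.602176634e-10  # energy of 1 GeV in joules


def reduced_planck_mass_gev():
    """Reduced Planck mass sqrt(hbar c / (8 pi G_N)), expressed in GeV."""
    mass_kg = math.sqrt(HBAR * C_LIGHT / (8.0 * math.pi * G_NEWTON))
    # Convert the rest energy m c^2 from joules to GeV.
    return mass_kg * C_LIGHT**2 / JOULE_PER_GEV


M_PL = reduced_planck_mass_gev()
M_Z = 91.19  # Z boson mass in GeV (assumed rounded input)
print(f"M_Pl = {M_PL:.3e} GeV, m_Z / M_Pl = {M_Z / M_PL:.1e}")
```

The result agrees with the value of about $2.4 \times 10^{18}$ GeV quoted above, and the ratio $m_Z / M_{Pl}$ makes the size of the hierarchy explicit.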